vulnerability discovery
Hybrid Fuzzing with LLM-Guided Input Mutation and Semantic Feedback
Software fuzzing has become a cornerstone of automated vulnerability discovery, yet existing mutation strategies often lack semantic awareness, leading to redundant test cases and slow exploration of deep program states. In this work, we present a hybrid fuzzing framework that integrates static and dynamic analysis with Large Language Model (LLM)-guided input mutation and semantic feedback. Static analysis extracts control-flow and data-flow information, which is transformed into structured prompts for the LLM to generate syntactically valid and semantically diverse inputs. During execution, we augment traditional coverage-based feedback with semantic feedback signals (derived from program state changes, exception types, and output semantics), allowing the fuzzer to prioritize inputs that trigger novel program behaviors beyond mere code coverage. We implement our approach atop AFL++, combining program instrumentation with embedding-based semantic similarity metrics to guide seed selection. Evaluation on real-world open-source targets, including libpng, tcpdump, and SQLite, demonstrates that our method achieves faster time-to-first-bug, higher semantic diversity, and a competitive number of unique bugs compared to state-of-the-art fuzzers. This work highlights the potential of combining LLM reasoning with semantic-aware feedback to accelerate and deepen vulnerability discovery.
LLM-based Vulnerability Discovery through the Lens of Code Metrics
Felix Weissberg, Lukas Pirch, Erik Imgrund, Jonas Möller, Thorsten Eisenhofer, Konrad Rieck
Large language models (LLMs) excel in many tasks of software engineering, yet progress in leveraging them for vulnerability discovery has stalled in recent years. To understand this phenomenon, we investigate LLMs through the lens of classic code metrics. Surprisingly, we find that a classifier trained solely on these metrics performs on par with state-of-the-art LLMs for vulnerability discovery. A root-cause analysis reveals a strong correlation and a causal effect between LLMs and code metrics: When the value of a metric is changed, LLM predictions tend to shift by a corresponding magnitude. This dependency suggests that LLMs operate at a similarly shallow level as code metrics, limiting their ability to grasp complex patterns and fully realize their potential in vulnerability discovery. Based on these findings, we derive recommendations on how research should more effectively address this challenge.
- Europe > Germany > Berlin (0.04)
- Asia (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Switzerland > Basel-City > Basel (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
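A classifier over classic code metrics, as compared against LLMs above, might look like the following sketch. The abstract does not list the paper's exact metric set or model, so the three metrics and the logistic weights here are hypothetical placeholders:

```python
import math
import re

# Branching constructs for a McCabe-style complexity proxy.
BRANCHES = re.compile(r"\b(?:if|for|while|case)\b|&&|\|\|")

def code_metrics(func_src: str) -> dict:
    """Compute a few classic code metrics for one function.
    Hypothetical selection; not the paper's actual feature set."""
    lines = [l for l in func_src.splitlines() if l.strip()]
    return {
        "loc": len(lines),
        # 1 + number of branch points, approximating cyclomatic complexity.
        "cyclomatic": 1 + len(BRANCHES.findall(func_src)),
        # Indentation depth as a crude nesting proxy (4-space indents).
        "max_nesting": max((len(l) - len(l.lstrip())) // 4 for l in lines),
    }

def vuln_score(metrics: dict) -> float:
    """Logistic score with illustrative weights; a real classifier
    would learn these from labeled vulnerable/benign functions."""
    weights = {"loc": 0.01, "cyclomatic": 0.1, "max_nesting": 0.2}
    z = sum(weights[k] * metrics[k] for k in weights)
    return 1.0 / (1.0 + math.exp(-z))
```

The paper's causal finding corresponds to exactly this behavior: inflating a single metric (say, adding branches) shifts the score by a predictable amount, which is what the authors observe LLM predictions doing as well.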
AED: Automatic Discovery of Effective and Diverse Vulnerabilities for Autonomous Driving Policy with Large Language Models
Le Qiu, Zelai Xu, Qixin Tan, Wenhao Tang, Chao Yu, Yu Wang
Assessing the safety of autonomous driving policies is of great importance, and reinforcement learning (RL) has emerged as a powerful method for discovering critical vulnerabilities in driving policies. However, existing RL-based approaches often struggle to identify vulnerabilities that are both effective (meaning the autonomous vehicle is genuinely responsible for the accidents) and diverse (meaning they span various failure types). To address these challenges, we propose AED, a framework that uses large language models (LLMs) to automatically discover effective and diverse vulnerabilities in autonomous driving policies. We first utilize an LLM to automatically design reward functions for RL training. Then we let the LLM consider a diverse set of accident types and train adversarial policies for different accident types in parallel. Finally, we use preference-based learning to filter ineffective accidents and enhance the effectiveness of each vulnerability. Experiments across multiple simulated traffic scenarios and tested policies show that AED uncovers a broader range of vulnerabilities and achieves higher attack success rates compared with expert-designed rewards, thereby reducing the need for manual reward engineering and improving the diversity and effectiveness of vulnerability discovery.
- Asia > China > Beijing > Beijing (0.05)
- Asia > Middle East > Jordan (0.04)
- Asia > China > Guangdong Province > Shenzhen (0.04)
- Transportation > Ground > Road (1.00)
- Information Technology > Robotics & Automation (0.92)
- Automobiles & Trucks (0.92)
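A reward function of the kind AED asks an LLM to generate, for one accident type, might look like the sketch below. AED's actual generated rewards and state interface are not shown in the abstract, so the state keys, the fault check, and all constants here are invented for illustration:

```python
# Hypothetical reward for one accident type ("tested vehicle rear-ends
# the adversary"), in the style of an LLM-generated reward.
def rear_end_reward(state: dict) -> float:
    r = 0.0
    if state["collision"]:
        # Reward only accidents where the tested policy is at fault
        # (it struck the adversary from behind); penalize crashes the
        # adversary itself caused, to keep vulnerabilities "effective".
        r += 10.0 if state["victim_hit_adversary_rear"] else -10.0
    # Shaping term: encourage the adversary to cut in closely ahead,
    # creating opportunities for the failure without forcing it.
    r += max(0.0, 5.0 - state["gap_m"]) * 0.1
    return r
```

Training one such adversarial policy per accident type in parallel, each with its own generated reward, is what gives the framework its diversity; the at-fault check is what preference-based filtering would further refine.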
Will AI Make Cyber Swords or Shields: A few mathematical models of technological progress
Lohn, Andrew J, Jackson, Krystal Alex
Predicting the impact of technological advances may be a fool's errand, but it is a necessary one nonetheless if we are to guide research and funding toward efforts that benefit defense more than offense. In this paper, we mathematically model the impact of further advancement in several critical aspects of cybersecurity. Perhaps more important than any of the forewarnings or funding recommendations we arrive at, this approach strives to sharpen debates about AI's impact on cybersecurity. This is the companion paper to a separate report, published by CSET and titled "Will AI Make Cyber Swords or Shields," illustrating the value of rigor in policy discussions about technological advancement. There is too much uncertainty to believe that the math gives precise projections, but it forces us to be precise in our assumptions. Reasonable people may disagree with the range of input values we choose or even the models we use. We welcome those disagreements and hope they advance our collective understanding of how AI may change the future of cybersecurity. Following this introduction, we proceed with separate analyses of three areas of cybersecurity: 1) phishing, 2) vulnerability discovery, and 3) the dynamics between patching and exploitation.
- North America > United States > New York > New York County > New York City (0.05)
- North America > United States > District of Columbia > Washington (0.04)
- North America > United States > California > Los Angeles County > Santa Monica (0.04)
Towards Learning Representations of Binary Executable Files for Security Tasks
Shushan Arakelyan, Christophe Hauser, Erik Kline, Aram Galstyan
Tackling binary analysis problems has traditionally implied manually defining rules and heuristics. As an alternative, we suggest using machine learning models to learn distributed representations of binaries that are applicable to a number of downstream tasks. We construct a computational graph from the binary executable and use it with a graph convolutional neural network to learn a high-dimensional representation of the program. We show the versatility of this approach by using our representations to solve two semantically different binary analysis tasks -- algorithm classification and vulnerability discovery. We compare the proposed approach to our own strong baseline as well as published results, and demonstrate improvements over state-of-the-art methods on both tasks.
- North America > United States > Texas > Travis County > Austin (0.04)
- North America > United States > Texas > Dallas County > Dallas (0.04)
- North America > United States > Tennessee > Anderson County > Oak Ridge (0.04)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.46)
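The pipeline in that last abstract (graph from the binary, graph convolutions, pooled program representation) can be sketched in miniature as follows. The paper's actual graph construction, node features, and learned weights are not given in the abstract; this sketch uses a tiny control-flow-graph adjacency dict, identity weights, and mean pooling purely to show the propagation pattern:

```python
# One simplified graph-convolution step over a control-flow graph
# extracted from a binary; features and weights are illustrative.
def gcn_layer(adj, feats):
    """Mean-aggregate each node's neighborhood (including itself),
    then apply a ReLU nonlinearity. A real layer would also multiply
    by a learned weight matrix; identity is used here."""
    out = {}
    for node, nbrs in adj.items():
        group = [node] + list(nbrs)
        dim = len(feats[node])
        agg = [sum(feats[n][d] for n in group) / len(group) for d in range(dim)]
        out[node] = [max(0.0, v) for v in agg]
    return out

def graph_embedding(adj, feats, layers=2):
    """Stack layers so information propagates `layers` hops, then
    mean-pool node states into a single program representation."""
    for _ in range(layers):
        feats = gcn_layer(adj, feats)
    dim = len(next(iter(feats.values())))
    return [sum(f[d] for f in feats.values()) / len(feats) for d in range(dim)]
```

The resulting fixed-size vector is what makes the representation reusable: the same embedding can feed an algorithm classifier or a vulnerability detector without redefining per-task heuristics.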